Efficient, Accurate and Privacy-Preserving Data Mining for Frequent Itemsets in Distributed Databases
Authors
Abstract
Mining distributed databases is emerging as a fundamental computational problem. A common approach is to move all of the data from each database to a central site, where a single model is built. This approach is accurate, but too expensive in terms of the time required. For this reason, several approaches have been developed to mine distributed databases efficiently, but they still ignore a key issue: privacy. Privacy is the right of individuals or organizations to keep their own information secret. Privacy concerns can prevent data movement: data may be distributed among several custodians, none of which is allowed to transfer its data to another site. In this paper we present an efficient approach for mining frequent itemsets in distributed databases. Our approach is accurate and uses a privacy-preserving communication mechanism. It is also efficient in terms of message-passing overhead, requiring only one round of communication during the mining operation. We show that our privacy-preserving distributed approach outperforms the application of a well-known mining algorithm to distributed databases.
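To make the problem setting concrete, the following is a minimal sketch of what "globally frequent itemsets" means when the data is split across sites. It is not the protocol proposed in the paper, which the abstract does not describe. The function names `local_counts` and `global_frequent`, the `max_size` cap, and the toy data are all hypothetical. Each site sends only itemset counts in a single message, never its transactions, but the counts are sent in the clear, so this sketch offers no privacy protection by itself.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, max_size=2):
    """Count every itemset of up to max_size items in one site's transactions."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order, no duplicate items
        for k in range(1, max_size + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts


def global_frequent(sites, min_support, max_size=2):
    """Sum the per-site counts and keep itemsets meeting the global support.

    Assumption: each site contributes one message holding its local counts,
    and the coordinator sees those counts unmasked (no privacy mechanism).
    """
    total = Counter()
    n_transactions = 0
    for transactions in sites:
        total.update(local_counts(transactions, max_size))
        n_transactions += len(transactions)
    threshold = min_support * n_transactions
    return {itemset: c for itemset, c in total.items() if c >= threshold}


# Two hypothetical sites with two transactions each.
sites = [
    [["a", "b"], ["a", "c"]],
    [["a", "b"], ["b", "c"]],
]
print(global_frequent(sites, min_support=0.5))
```

With a minimum support of 0.5 over the four transactions, the itemsets kept are `('a',)`, `('b',)`, `('c',)` and `('a', 'b')`. Enumerating every candidate itemset locally is only workable for tiny examples; practical algorithms prune candidates, and a privacy-preserving scheme would additionally hide each site's individual counts.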
Similar resources
Data sanitization in association rule mining based on impact factor
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations regarding the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...
Mining Frequent Itemsets in Distorted Databases with Granular Computing
Data perturbation is one popular method to achieve privacy-preserving data mining. However, distorted databases bring enormous overheads to mining algorithms as compared to original databases. In this paper, we present the GrC-FIM algorithm to address the efficiency problem in mining frequent itemsets from distorted databases. Two measures are introduced to overcome the weakness in existing wor...
Privacy-Preserving Mining of Association Rules on Distributed Databases
Data mining techniques can extract hidden but useful information from large databases. Most efficient approaches for mining distributed databases suppose that all of the data at each site can be shared. However, source transaction databases usually include very sensitive information. In order to obtain an accurate mining result on distributed databases and to preserve the private data that is a...
Efficient Data Mining for Frequent Itemsets in Dynamic and Distributed Databases
Data Mining is one of the central activities associated with understanding and exploiting the world of digital data. It is the mechanized process of modeling large databases by means of discovering useful patterns. A frequent itemset is a pattern describing a relevant subset of the data, and a collection of frequent itemsets is particularly useful because it is an extremely compact model of the...
Privacy-preserving algorithms for distributed mining of frequent itemsets
Standard algorithms for association rule mining are based on identification of frequent itemsets. In this paper, we study how to maintain privacy in distributed mining of frequent itemsets. That is, we study how two (or more) parties can find frequent itemsets in a distributed database without revealing each party’s portion of the data to the other. The existing solution for vertically partitio...